
Exact and Monte Carlo calculations of integrated likelihoods for the latent class model

Author: Aitkin
Biernacki
C. Biernacki
Celeux
Dempster
Fraley
Frühwirth-Schnatter
G. Celeux
G. Govaert
Goodman
McLachlan
Nadif
Rand
Robert
Schwarz
Spiegelhalter
Stephens
Publication venue: Elsevier BV

An Entropy criterion for assessing the number of clusters in a mixture model

Author: Celeux Gilles
Soromenho G.
Publication venue: HAL CCSD
Publication date: 01/04/1993

Project CLOREC. In this paper, we consider an entropy criterion to estimate the number of clusters arising from a mixture model. This criterion is derived from a relation linking the likelihood and the classification likelihood of a mixture. Its performance is investigated through Monte Carlo numerical experiments, which show favourable results compared with other classical criteria.
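
As we understand it, the relation in question is the identity L(K) = C(K) + E(K), where L is the mixture log-likelihood, C the classification log-likelihood weighted by the posterior probabilities t_ik, and E(K) = -Σ_i Σ_k t_ik ln t_ik the entropy of the fuzzy partition. A normalized entropy criterion can then be written NEC(K) = E(K) / (L(K) - L(1)), with smaller values favoured. The minimal Python sketch below checks the identity on a one-dimensional Gaussian mixture; the function names and the fixed parameter values are hypothetical (in practice the parameters would come from EM fits for each K), and the K = 1 case, where the ratio is undefined, is left to the caller.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def mixture_terms(data, weights, mus, sigmas):
    """Return (L, C, E) for a 1-D Gaussian mixture with the given parameters.

    L: observed-data log-likelihood
    C: classification log-likelihood weighted by the posteriors t_ik
    E: entropy of the posterior probabilities t_ik
    """
    L = C = E = 0.0
    for x in data:
        f = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, mus, sigmas)]
        total = sum(f)
        L += math.log(total)
        for fk in f:
            t = fk / total  # posterior probability that x belongs to component k
            if t > 0.0:
                C += t * math.log(fk)
                E -= t * math.log(t)
    return L, C, E


def nec(L_K, L_1, E_K):
    """Normalized entropy criterion for K > 1 components (smaller is better)."""
    if L_K <= L_1:
        return math.inf  # the K-component fit does not improve on K = 1
    return E_K / (L_K - L_1)


# Hypothetical, hand-picked parameters standing in for EM fits.
data = [-2.1, -1.9, -2.0, 2.0, 2.2, 1.8]
L1, C1, E1 = mixture_terms(data, [1.0], [0.0], [2.0])
L2, C2, E2 = mixture_terms(data, [0.5, 0.5], [-2.0, 2.0], [0.15, 0.15])
print(f"L - (C + E) = {L2 - (C2 + E2):.2e}, NEC(2) = {nec(L2, L1, E2):.4f}")
```

The identity holds term by term because Σ_k t_ik ln(f_ik / t_ik) = ln Σ_k f_ik for each observation, which is why the entropy measures exactly the gap between the two likelihoods.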

INRIA a CCSD electronic archive server

Detection of elliptical shapes via cross-entropy clustering

Author: A. Fitzgibbon
A. Samé
C. Fraley
E.R. Davies
G. Celeux
G.J. McLachlan
J. Illingworth
K. Saeed
L. Mirsky
P.D. Mcnicholas
S. Tsuji
Publication venue: Springer Science and Business Media LLC
Publication date: 24/11/2012

We consider the problem of finding elliptical shapes in an image and discuss a solution that uses cross-entropy clustering. The proposed method allows searching for ellipses with predefined sizes and positions in space. Moreover, it works well when searching for ellipsoids in higher dimensions.
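
The abstract does not state the cost being minimized. Assuming the general-Gaussian form of cross-entropy clustering, a partition {X_1, ..., X_k} of N-dimensional data is scored by E = Σ_i p_i (-ln p_i + (N/2) ln(2πe) + (1/2) ln det Σ_i), where p_i is the fraction of points in X_i and Σ_i its maximum-likelihood covariance. The Python sketch below only evaluates this cost in two dimensions; it does not implement the paper's restriction to ellipses of predefined size or the iterative reassignment that minimizes the cost, and the function names are hypothetical.

```python
import math

DIM = 2  # this sketch handles 2-D points only


def covariance_2d(points):
    """Maximum-likelihood (divide-by-n) covariance entries of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    return sxx, sxy, syy


def cec_cost(clusters):
    """Cross-entropy clustering cost of a partition (Gaussians with free covariances)."""
    total = sum(len(c) for c in clusters)
    cost = 0.0
    for c in clusters:
        p = len(c) / total  # cluster weight
        sxx, sxy, syy = covariance_2d(c)
        det = sxx * syy - sxy * sxy
        if det <= 0.0:
            raise ValueError("degenerate cluster: singular covariance")
        # -ln p codes the cluster label; the rest is the Gaussian cross-entropy.
        cost += p * (-math.log(p) + 0.5 * DIM * math.log(2.0 * math.pi * math.e) + 0.5 * math.log(det))
    return cost


a = [(0, 0), (1, 0), (0, 1), (1, 1)]
b = [(x + 100, y + 100) for x, y in a]
print(cec_cost([a, b]), cec_cost([a + b]))  # splitting the two blobs lowers the cost
```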

arXiv.org e-Print Archive

Crossref

Jagiellonian University Repository

Localizing the Latent Structure Canonical Uncertainty: Entropy Profiles for Hidden Markov Models

Author: G Brushe
G Celeux
G McLachlan
J-B Durand
Jean-Baptiste Durand
M Crouse
O Cappé
PA Devijver
S Lauritzen
T Cover
W Zucchini
Y Ephraim
Y Guédon
Yann Guédon
Publication venue: Springer Science and Business Media LLC
Publication date: 29/02/2012

This report addresses state inference for hidden Markov models. These models rely on unobserved states, which often have a meaningful interpretation. This makes it necessary to develop diagnostic tools for quantifying state uncertainty. The entropy of the state sequence that explains an observed sequence for a given hidden Markov chain model can be considered the canonical measure of state sequence uncertainty. This canonical measure is not reflected by the classic multivariate state profiles computed by the smoothing algorithm, which summarize the possible state sequences. Here, we introduce a new type of profile with the following properties: (i) these profiles of conditional entropies decompose the canonical measure of state sequence uncertainty along the sequence, which makes it possible to localize this uncertainty; (ii) these profiles are univariate and thus remain easily interpretable on tree structures. We show how to extend the smoothing algorithms for hidden Markov chain and tree models to compute these entropy profiles efficiently.

Comment: Submitted to Journal of Machine Learning Research; No RR-7896 (2012)
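
Because the posterior distribution of the state sequence given the observations is itself a Markov chain, the canonical uncertainty decomposes by the chain rule as H(S_1, ..., S_T | X) = H(S_1 | X) + Σ_{t≥2} H(S_t | S_{t-1}, X). The Python sketch below computes these per-position terms for a discrete-emission hidden Markov chain from the usual forward-backward quantities. We assume this is the kind of conditional-entropy profile the abstract describes, but it is not the authors' algorithm: it omits the tree case and uses unscaled recursions that only suit short sequences. All names and parameter values are hypothetical.

```python
import math


def _entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def forward_backward(pi, A, B, obs):
    """Unscaled forward-backward for a discrete HMM; B[state][symbol]."""
    n, T = len(pi), len(obs)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(n)]]
    for t in range(1, T):
        prev = alpha[-1]
        alpha.append([sum(prev[i] * A[i][j] for i in range(n)) * B[j][obs[t]] for j in range(n)])
    beta = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] for j in range(n)) for i in range(n)]
    return alpha, beta, sum(alpha[-1])


def entropy_profile(pi, A, B, obs):
    """Return [H(S_1|X), H(S_2|S_1,X), ..., H(S_T|S_{T-1},X)]."""
    alpha, beta, lik = forward_backward(pi, A, B, obs)
    n, T = len(pi), len(obs)
    gamma = [[alpha[t][i] * beta[t][i] / lik for i in range(n)] for t in range(T)]
    profile = [_entropy(gamma[0])]
    for t in range(1, T):
        h_t = 0.0
        for i in range(n):
            if gamma[t - 1][i] == 0.0:
                continue
            # P(S_t = j | S_{t-1} = i, X) = xi_t(i, j) / gamma_{t-1}(i)
            cond = [alpha[t - 1][i] * A[i][j] * B[j][obs[t]] * beta[t][j] / lik / gamma[t - 1][i]
                    for j in range(n)]
            h_t += gamma[t - 1][i] * _entropy(cond)
        profile.append(h_t)
    return profile


pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
print(entropy_profile(pi, A, B, [0, 1, 0]))
```

Summing the profile recovers the global state-sequence entropy, while each entry shows where along the sequence that uncertainty sits.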

arXiv.org e-Print Archive

Crossref

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

HAL Descartes

Agritrop

HAL-CIRAD

Adaptive Seeding for Gaussian Mixture Models

Author: AP Dempster
C Biernacki
C Bishop
G Celeux
GJ McLachlan
J-P Baudry
JJ Verbeek
JM Geusebroek
R Maitra
TF Gonzalez
V Melnykov
W Kwedlo
Publication venue: Springer Science and Business Media LLC
Publication date: 30/05/2017

We present new initialization methods for the expectation-maximization (EM) algorithm for multivariate Gaussian mixture models. Our methods are adaptations of the well-known k-means++ initialization and the Gonzalez algorithm. We thereby aim to close the gap between simple random methods (e.g., uniform seeding) and complex methods that crucially depend on the right choice of hyperparameters. Our extensive experiments on artificial as well as real-world data sets indicate the usefulness of our methods compared to common techniques and methods which, e.g., apply the original k-means++ and Gonzalez algorithms directly.

Comment: This is a preprint of a paper that has been accepted for publication in the Proceedings of the 20th Pacific Asia Conference on Knowledge Discovery and Data Mining (PAKDD) 2016. The final publication is available at link.springer.com (http://link.springer.com/chapter/10.1007/978-3-319-31750-2 24
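
For reference, the two classical seeding schemes that the paper adapts can be sketched in Python as follows: k-means++ draws each new center with probability proportional to the squared distance to the nearest center already chosen, while the Gonzalez algorithm deterministically picks the point farthest from the current centers (farthest-first traversal). This is a minimal sketch of the original schemes, not of the adapted EM initializations, which must also provide covariances and mixture weights; the function names are hypothetical.

```python
import random


def _sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeanspp_seeds(points, k, rng):
    """k-means++: sample new centers with probability proportional to D(x)^2."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        weights = [min(_sq_dist(p, c) for c in centers) for p in points]
        if sum(weights) == 0:
            break  # every point already coincides with a center
        centers.append(rng.choices(points, weights=weights)[0])
    return centers


def gonzalez_seeds(points, k, first=0):
    """Gonzalez: repeatedly add the point farthest from its nearest center."""
    centers = [points[first]]
    while len(centers) < k:
        dists = [min(_sq_dist(p, c) for c in centers) for p in points]
        best = max(range(len(points)), key=dists.__getitem__)
        if dists[best] == 0:
            break  # no uncovered point is left
        centers.append(points[best])
    return centers


points = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0), (5.0, 0.0)]
print(gonzalez_seeds(points, 3))
print(kmeanspp_seeds(points, 3, random.Random(0)))
```

The trade-off is visible even here: Gonzalez is deterministic but sensitive to outliers because it always takes the farthest point, whereas k-means++ randomizes that choice in proportion to squared distance.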

arXiv.org e-Print Archive

Crossref

Enhancing the selection of a model-based clustering with external categorical variables

Author: Amorim M. J.
Baudry J.-P.
Cardoso M. G. M. S.
Celeux G.
Ferreira A. S.
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2015

In cluster analysis, it can be useful to interpret the partition built from the data in the light of external categorical variables which are not directly involved in clustering the data. An approach is proposed in the model-based clustering context to select a number of clusters which both fits the data well and takes advantage of the potential illustrative ability of the external variables. This approach makes use of the integrated joint likelihood of the data and the partitions at hand, namely the model-based partition and the partitions associated with the external variables. It is noteworthy that each mixture model is fitted to the data by maximum likelihood, excluding the external variables, which are used only to select a relevant mixture model. Numerical experiments illustrate the promising behaviour of the derived criterion.

Repositório Institucional do ISCTE-IUL

Segmental K-Means Learning with Mixture Distribution for HMM Based Handwriting Recognition

Author: B.H. Juang
C.F.J. Wu
G. Celeux
L. Rabiner
T. Zant van der
T.K. Bhowmik
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2011

This paper investigates the performance of hidden Markov models (HMMs) for handwriting recognition. The Segmental K-Means algorithm is used instead of the Baum-Welch algorithm to update the transition and observation probabilities. Observation probabilities are modelled as multivariate Gaussian mixture distributions. A deterministic clustering technique is used to estimate the initial parameters of an HMM. The Bayesian information criterion (BIC) is used to select the topology of the model. The wavelet transform is used to extract features from a grey-scale image, which avoids binarization of the image.
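
Segmental K-Means replaces the Baum-Welch expectation step by a single best path: decode the most likely state sequence with the Viterbi algorithm, then re-estimate the parameters by counting along that path. The Python sketch below shows one such iteration. To stay short it assumes one-dimensional observations and a single Gaussian per state, whereas the paper uses Gaussian mixtures on wavelet features; it also does not cover the deterministic initialization or the BIC-based topology selection. Function names and values are hypothetical.

```python
import math


def safe_log(p):
    """log(p), mapping log(0) to -inf for forbidden starts or transitions."""
    return math.log(p) if p > 0.0 else -math.inf


def log_gauss(x, mu, var):
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)


def viterbi(obs, pi, A, means, variances):
    """Most likely state path for a 1-D Gaussian-emission HMM (log domain)."""
    n = len(pi)
    delta = [safe_log(pi[s]) + log_gauss(obs[0], means[s], variances[s]) for s in range(n)]
    back = []
    for x in obs[1:]:
        ptr, new = [], []
        for j in range(n):
            i_best = max(range(n), key=lambda i: delta[i] + safe_log(A[i][j]))
            ptr.append(i_best)
            new.append(delta[i_best] + safe_log(A[i_best][j]) + log_gauss(x, means[j], variances[j]))
        back.append(ptr)
        delta = new
    state = max(range(n), key=lambda s: delta[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]


def segmental_kmeans_step(obs, pi, A, means, variances, var_floor=1e-3):
    """One iteration: Viterbi segmentation, then re-estimation along the path."""
    n = len(pi)
    path = viterbi(obs, pi, A, means, variances)
    counts = [[0] * n for _ in range(n)]
    for s, t in zip(path, path[1:]):
        counts[s][t] += 1
    new_A = []
    for i in range(n):
        row_total = sum(counts[i])
        new_A.append([c / row_total for c in counts[i]] if row_total else list(A[i]))
    new_means, new_vars = list(means), list(variances)
    for s in range(n):
        xs = [x for x, st in zip(obs, path) if st == s]
        if xs:  # states never visited keep their old parameters
            m = sum(xs) / len(xs)
            new_means[s] = m
            new_vars[s] = max(var_floor, sum((x - m) ** 2 for x in xs) / len(xs))
    return path, new_A, new_means, new_vars


obs = [0.1, -0.2, 0.0, 9.8, 10.1, 10.0]
path, A1, mu1, var1 = segmental_kmeans_step(obs, [1.0, 0.0], [[0.7, 0.3], [0.0, 1.0]], [0.0, 10.0], [1.0, 1.0])
print(path, A1, mu1, var1)
```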

Crossref

Proceedings - University of Groningen

University of Groningen

ARTS repository - University of Groningen

Dissertations of the University of Groningen

Bayesian solutions to the label switching problem

Author: A. Jasra
G. Celeux
G. McLachlan
G.K. Gerber
J. Diebolt
J. Geweke
J. Munkres
J.S. Liu
M. Hurn
M. Postman
M. Stephens
R.M. Neal
Publication venue: Teknillinen korkeakoulu
Publication date: 01/01/2008

The label switching problem, i.e., the unidentifiability of the permutation of clusters or, more generally, of latent variables, makes results computed with MCMC sampling difficult to interpret. We introduce a fully Bayesian treatment of the permutations which performs better than alternatives. The method can be used to compute summaries of the posterior samples even for nonparametric Bayesian methods, for which no good solutions exist so far. Although approximate in this case, the results are very promising. The summaries are intuitively appealing: a summarized cluster is defined as a set of points for which the likelihood of being in the same cluster is maximized.
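
One simple way to see why co-clustering probabilities sidestep label switching: the posterior probability that points i and j share a cluster is unchanged by any relabelling of the sampled allocations. The Python sketch below estimates these probabilities from MCMC allocation samples and then picks the sampled partition closest to them in squared error (least-squares clustering, a standard summary used here as a stand-in). It is not the fully Bayesian treatment of permutations proposed in the paper; the function names and samples are hypothetical.

```python
def posterior_similarity(samples):
    """Fraction of MCMC samples in which points i and j share a label."""
    n, m = len(samples[0]), len(samples)
    psm = [[0.0] * n for _ in range(n)]
    for z in samples:
        for i in range(n):
            for j in range(n):
                if z[i] == z[j]:
                    psm[i][j] += 1.0 / m
    return psm


def least_squares_summary(samples):
    """Sampled partition whose co-clustering indicators best match the similarity matrix."""
    psm = posterior_similarity(samples)
    n = len(psm)

    def loss(z):
        return sum((float(z[i] == z[j]) - psm[i][j]) ** 2 for i in range(n) for j in range(n))

    return min(samples, key=loss)


# The first two samples are the same partition with the labels swapped.
samples = [[0, 0, 1, 1], [1, 1, 0, 0], [0, 0, 0, 1]]
print(least_squares_summary(samples))
```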

Crossref

Aaltodoc Publication Archive

Mixed mode data clustering: an approach based on tetrachoric correlations

Author: A. Skrondal
B.S. Everitt
C. Manski
C.J. Lawrence
E. Muraki
G. Celeux
J.D. Banfield
J.J. Heckman
J.K. Vermunt
R.D. Bock
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2011

In this paper, we address the problem of clustering mixed-mode data by assuming that the observed binary variables are generated from latent continuous variables. We perform a principal components analysis on the matrix of tetrachoric correlations, and we then estimate the scores of each latent variable and construct a data matrix with continuous variables to be used in fully Gaussian mixture models or in k-means cluster analysis. The calculation of the expected a posteriori (EAP) estimates may proceed by simply considering a limited number of quadrature points. Results on a simulation study and on a real data set are reported.
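
To illustrate EAP estimation with a limited number of quadrature points, the Python sketch below computes the posterior mean of one latent score from binary responses. It assumes a one-factor probit model, P(y_j = 1 | θ) = Φ(a_j θ - b_j) with a N(0, 1) prior, and hypothetical loadings a_j and thresholds b_j (in the paper these would follow from the tetrachoric-correlation analysis). It uses an equally spaced grid weighted by the normal density rather than Gauss-Hermite nodes; all names are hypothetical.

```python
import math


def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def eap_score(responses, loadings, thresholds, n_points=21, bound=4.0):
    """EAP estimate of one latent score under an assumed probit one-factor model.

    The posterior mean is approximated on n_points equally spaced nodes in
    [-bound, bound], each weighted by the unnormalized N(0, 1) density.
    """
    num = den = 0.0
    for q in range(n_points):
        theta = -bound + 2.0 * bound * q / (n_points - 1)
        weight = math.exp(-0.5 * theta * theta)  # prior density up to a constant
        lik = 1.0
        for y, a, b in zip(responses, loadings, thresholds):
            p = normal_cdf(a * theta - b)
            lik *= p if y == 1 else 1.0 - p
        num += theta * weight * lik
        den += weight * lik
    return num / den


loadings, thresholds = [1.2, 0.8, 1.0], [0.0, 0.0, 0.0]
print(eap_score([1, 1, 1], loadings, thresholds), eap_score([0, 0, 0], loadings, thresholds))
```

The normalizing constants of the prior cancel in the ratio, so only relative node weights matter; a handful of nodes is usually enough because the integrand is smooth.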

Crossref

Archivio istituzionale della ricerca - Università di Modena e Reggio Emilia

Model-Based Clustering and Classification of Functional Data

Author: Breiman L.
Celeux G.
Cormen T. H.
Dempster A. P.
Diebold F.
Ferraty F.
Frühwirth‐Schnatter S.
Hastie T.
McLachlan G. J.
Raftery A. E.
Titterington D.
Publication date: 01/03/2018

The problem of complex data analysis is a central topic of modern statistical science and learning systems and is becoming of broader interest with the increasing prevalence of high-dimensional data. The challenge is to develop statistical models and autonomous algorithms that are able to acquire knowledge from raw data for exploratory analysis, which can be achieved through clustering techniques, or to make predictions of future data via classification (i.e., discriminant analysis) techniques. Latent data models, including mixture model-based approaches, are among the most popular and successful approaches in both the unsupervised context (i.e., clustering) and the supervised one (i.e., classification or discrimination). Although traditionally tools of multivariate analysis, they are growing in popularity when considered in the framework of functional data analysis (FDA). FDA is the data analysis paradigm in which the individual data units are functions (e.g., curves, surfaces) rather than simple vectors. In many areas of application, the analyzed data are indeed often available in the form of discretized values of functions or curves (e.g., time series, waveforms) and surfaces (e.g., 2D images, spatio-temporal data). This functional aspect of the data adds difficulties compared to classical multivariate (non-functional) data analysis. We review and present approaches for model-based clustering and classification of functional data. We derive well-established statistical models along with efficient algorithmic tools to address problems regarding the clustering and the classification of these high-dimensional data, including their heterogeneity, missing information, and dynamical hidden structure. The presented models and algorithms are illustrated on real-world functional data analysis problems from several application areas.

arXiv.org e-Print Archive

HAL - Normandie Université

Crossref